Robust Classification Under Sample Selection Bias

Authors

  • Anqi Liu
  • Brian D. Ziebart
Abstract

In many important machine learning applications, the source distribution used to estimate a probabilistic classifier differs from the target distribution on which the classifier will be used to make predictions. Due to its asymptotic properties, sample reweighted empirical loss minimization is a commonly employed technique to deal with this difference. However, given finite amounts of labeled source data, this technique suffers from significant estimation errors in settings with large sample selection bias. We develop a framework for learning a robust bias-aware (RBA) probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation. Our approach requires only accurate estimates of statistics under the source distribution and is otherwise as robust as possible to unknown properties of the conditional label distribution, except when explicit generalization assumptions are incorporated. We demonstrate the behavior and effectiveness of our approach on binary classification tasks.
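As a rough illustration of the sample reweighted empirical loss minimization that the abstract contrasts with, the sketch below computes an importance-weighted logistic loss together with the effective sample size of the weights. It is not the RBA method. The one-dimensional Gaussian source and target densities, the toy labels, and all function names are assumptions made for this example only.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed form of source/target inputs)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance_weights(x, src_mean, src_std, tgt_mean, tgt_std):
    """Density ratio p_target(x) / p_source(x) for each source sample."""
    return gaussian_pdf(x, tgt_mean, tgt_std) / gaussian_pdf(x, src_mean, src_std)

def reweighted_log_loss(theta, x, y, weights):
    """Importance-weighted logistic loss; y in {0, 1}, theta = (slope, intercept)."""
    logits = theta[0] * x + theta[1]
    # Numerically stable form of -[y log p + (1 - y) log(1 - p)].
    per_example = np.logaddexp(0.0, logits) - y * logits
    return np.sum(weights * per_example) / len(x)

def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: equals n for uniform weights, shrinks as weights concentrate."""
    return weights.sum() ** 2 / np.sum(weights ** 2)

# Hypothetical toy data: 200 source samples drawn from N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = (x > 0.5).astype(float)

# Mild bias (target mean 0.2) versus large bias (target mean 2.0).
w_mild = importance_weights(x, 0.0, 1.0, 0.2, 1.0)
w_severe = importance_weights(x, 0.0, 1.0, 2.0, 1.0)

print("ESS, mild bias:  ", effective_sample_size(w_mild))
print("ESS, severe bias:", effective_sample_size(w_severe))
print("weighted loss at theta = (1, 0):", reweighted_log_loss((1.0, 0.0), x, y, w_severe))
```

Under the larger shift, a handful of source points carry most of the weight, so the effective sample size drops well below the nominal 200. This is one way to see why the reweighted estimate becomes unreliable with finite data.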

Similar resources

Semiparametric Efficient and Robust Estimation of an Unknown Symmetric Population Under Arbitrary Sample Selection Bias

We propose semiparametric methods to estimate the center and shape of a symmetric population when a representative sample of the population is unavailable due to selection bias. We allow an arbitrary sample selection mechanism determined by the data collection procedure, and we do not impose any parametric form on the population distribution. Under this general framework, we construct a family ...

Robust Covariate Shift Regression

In many learning settings, the source data available to train a regression model differs from the target data it encounters when making predictions due to input distribution shift. Appropriately dealing with this situation remains an important challenge. Existing methods attempt to “reweight” the source data samples to better represent the target domain, but this introduces strong inductive bia...

Bias and MSE Analysis of the IV Estimator Under Weak Identification with Application to Bias Correction

We provide results on properties of the IV estimator in the presence of weak instruments, beginning with the derivation of analytical formulae for the asymptotic bias (ABIAS) and mean squared error (AMSE). We also obtain approximations for the ABIAS and AMSE formulae based on an asymptotic scheme which, loosely speaking, requires the expectation of the first stage F-statistic to converge to a ...

A robust multi-objective global supplier selection model under currency fluctuation and price discount

A robust supplier selection problem is proposed using a scenario-based approach in which demand and exchange rates are subject to uncertainty. First, a deterministic multi-objective mixed integer linear program is developed; then, its robust counterpart is presented using a recent extension of robust optimization theory. We discuss dec...

Model Selection in Classification: the Swapping Method

In this article, the bias of the empirical error rate in supervised classification is studied. The exact formula and a robust estimator of the bias are given. From these results, we propose a new penalized criterion to perform model selection in classification. Applications to simulated and real data are presented.

Publication date: 2014